-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NwbTimeSeriesExtractor
to load non-electrical series data from NWB
#3587
Add NwbTimeSeriesExtractor
to load non-electrical series data from NWB
#3587
Conversation
NwbTimeSeriesExtractor
to extract TimeSeries data from NWBNwbTimeSeriesExtractor
to load non-electrical series data from NWB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple cosmetic things.
and returns a list with their paths. | ||
""" | ||
if backend == "hdf5": | ||
import h5py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how big h5py is or zarr for that matter. Is there any benefit to just import the exact thing we need (ie import h5py.Group rather than the full package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most packages do not keep the type of discipline that you would need for this to make a difference. I venture to guess that it would not make a difference in this case but I don't have time to profile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly just didn't know the size of these packages. Something like scipy would be so slow and heavy for this kind of check, but if these are rather small/fast (which has been my experience playing with h5py at least) then it doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, what I mean is that usually both of them are equivalent because importing one attribute imports the full module in most python packages. Anyway, here are the measurements:
@h-laptop$ conda activate work
(work) @h-laptop$ python -m timeit -s "import h5py"
50000000 loops, best of 5: 5.43 nsec per loop
(work) @h-laptop$ python -m timeit -s "from h5py import Group"
50000000 loops, best of 5: 6.33 nsec per loop
(work) @h-laptop$ python -m timeit -s "from zarr import Group"
50000000 loops, best of 5: 5.39 nsec per loop
(work) @h-laptop$ python -m timeit -s "import zarr"
50000000 loops, best of 5: 5.31 nsec per loop
If True, the time vector is loaded into the recording object. Useful when | ||
precise timing information is needed. | ||
samples_for_rate_estimation : int, default: 1000 | ||
The number of timestamp samples used for estimating the sampling rate when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand the writing. This would just be the number of timestamps right? I'm not quite sure what it would mean to say timestamp samples? But I don't know this format so I could be totally off on this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, confusing, the thing is that the object/data container is called timestamps. That time series (timestamps) has samples.
Let me think this through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I welcome any suggestion though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would need to see an example. Maybe I'll look at the test and see if I can better understand how this works. If it is confusing naming then we just have to live with it.
self.timeseries_path = list(time_series_dict.keys())[0] | ||
else: | ||
raise ValueError( | ||
f"Multiple TimeSeries found! Specify 'timeseries_path'. Options: {list(time_series_dict.keys())}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
Co-authored-by: Zach McKenzie <[email protected]>
Can I merge this? anyone else comments? |
LGTM |
Some users would like to access data from NWB that is not an ElectricalSeries to analyze it with SpikeInterface. TimeSeries data can be loaded lazily to Spikeinterface but it does not have all the ElectricalSeries infrastructure like electrodes so it can not be loaded by the current extractor. This PR adds another extractor that can be used to load TimeSeries data.
@steevelaquitaine @borrepp